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Deterministic global optimization using space-filling curves 
and multiple estimates of Lipschitz and Holder constants 


Daniela Lera* and Yaroslav D. Sergeyev^^ 


Abstract 

In this paper, the global optimization problem minyg 5 -^(y) with S being 
a hyperinterval in and F{y) satisfying the Lipschitz condition with an 
unknown Lipschitz constant is considered. It is supposed that the function 
F{y) can be multiextremal, non-differentiable, and given as a ‘black-box’. 
To attack the problem, a new global optimization algorithm based on the 
following two ideas is proposed and studied both theoretically and numerically. 
First, the new algorithm uses numerical approximations to space-filling curves 
to reduce the original Lipschitz multi-dimensional problem to a univariate 
one satisfying the Holder condition. Second, the algorithm at each iteration 
applies a new geometric technique working with a number of possible Holder 
constants chosen from a set of values varying from zero to infinity showing so 
that ideas introduced in a popular DIRECT method can be used in the Holder 
global optimization. Convergence conditions of the resulting deterministic 
global optimization method are established. Numerical experiments carried 
out on several hundreds of test functions show quite a promising performance 
of the new algorithm in comparison with its direct competitors. 

Key Words. Global optimization, Lipschitz functions, space-filling curves. 
Holder functions, deterministic numerical algorithms, DIRECT, classes of test 
functions. 


1 Introduction 

Let us consider the following global optimization problem 

mm{F{y) : y e S = [a,b]}, (1.1) 
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where [a, b] is a hyperinterval in R^. It is snpposed that the objective fnnction 
F{y) can be mnltiextremal, possibly non-differentiable and it satishes the Lipschitz 
condition 

\F{y') - F{y")\ < L\\y'- y'% y',y" e [a,b], (1.2) 

with an unknown constant L, 0 < L < oo, in the Euclidean norm. This statement 
can very frequently be met in applications where each evaluation of F{y) can be very 
expensive from the computational point of view (see, e.g., PEiDroiDSiissiESisnisg, 
etc.). Due to this reason, in the literature there exist numerous methods dedicated to 
the problem (II.ip . (II.2p (together with references indicated above we can mention 
such recent publications as [21 HI |5l H?! [181 iSSl [271 SSI Hi])- If should be also 
mentioned that methods for problems where the gradient of the objective function 
satishes the Lipschitz condition were also studied (see, e.g., |ll[l3l[2ni[2l[3ll[35l[37], 
etc.). 

One of the main issues regarding the problem fll.ip . fll.2p is related to the treat¬ 
ment of the Lipschitz constant L. In the literature, there exist several approaches 
for acquiring the Lipschitz information that can be distinguished with respect to 
the way the Lipschitz constant is estimated during the process of optimization. For 
instance, there exist algorithms (see, e.g., n HDl mi EZl [2S]) that use an a priori 
given estimate L of L (it should be mentioned that usually in practice it is difficult 
to obtain valid estimates) or an adaptive estimate Li that is recalculated at each 
iteration i of the method (see, e.g., [281 EH [381 00]). 

It is well known that estimates of the Lipschitz constant have a signihcant in¬ 
fluence on the convergence and the speed of the global optimization algorithms (see 
[SZl ESI SO]). Namely, tight estimates can accelerate the search, overestimates can 
slow it down, and underestimates can lead to losing the global solution. Algorithms 
that use in their work a global estimate Lj or an a priori given estimate L do not take 
into account any local information about the behavior of the objective function over 
small subregions of the domain S. It has been shown in [161 ISIl ESI E21 EH EH 00] 
that a smart usage of local estimates Li{Dj) of the local Lipschitz constants L{Dj) 
over subregions Dj C S allows one to signihcantly accelerate the global search. 
Naturally, a balancing between the local and global information must be performed 
in an appropriate way in order to avoid missing the global solution. Another in¬ 
teresting approach that has been introduced in [TS] uses at each iteration several 
estimates of the Lipschitz constant L. This algorithm works by partioning the hy¬ 
perinterval S in subintervals (called hereinafter also subboxes) and due to this reason 
it has been called by its authors DIRECT (Dividing RECTangles). 

We can see therefore that in the literature there exist four practical strategies 
to obtain a Lipschitz information for its subsequent usage in numerical methods: 
a) to consider the real constant L, if it is given, or to use its overestimate when it is 
possible; b) to calculate dynamically a global a global (i.e., the same for the whole 
search region) adaptive estimate of L; c) to consider estimates of local Lipschitz 
constants related to sub intervals of the search region S'; d) to take into consideration 
a set of estimates of L selected among all possible values varying from zero to inhnity. 
In this paper, our attention will be dedicated to this fourth alternative. 
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Figure 1: Approximation of the Peano curve (Hilbert’s version of its construction is 
used here) in dimension N=2. 


As was mentioned above, in the literature there exists a variety of numerical 
methods dedicated to the problem fll.ip . fll.2p . One of non trivial views on the 
problem consists of the reduction of the dimension of fll.ip . fll.2p with the help of 
space-hlling curves. These curves, hrst introduced by Peano (1890) and Hilbert 
(1891) (see [H EB ESI ESI SHI E]), emerge as the limit objects generated by an 
iterative process. They are fractals constructed using the principle of self-similarity. 
Peano-Hilbert curves £11 in the hypercube [a, 6] C R^, i.e., they pass through every 
point of [a, 6], and this gave rise to the term space-filling curves. Examples of 
construction of these curves in two dimensions are given in Fig. [1] 

It has been shown (see [I] ) [M] , |1D] ) that, by using space filling curves, the mul¬ 
tidimensional global minimization problem fll.ip . fll.2p can be turned into a one¬ 
dimensional problem. In particular, Strongin has proved in [39] that finding the 
global minimum of the Lipschitz function F{y),y G , over a hypercube is equiv¬ 
alent to determining the global minimum of the function f{x) in an interval, i.e., it 
follows 

f{x) = F{p{x)), xe[0,l], (1.3) 

where p{x) is the Peano curve. Moreover, the Holder condition 

|/(x') - f{x'')\ < H\x' - x''\^^^, x',x'' e [0,1], (1.4) 

holds (see jlO]) for the function f{x) with the constant 

H = 2L^/WT3, (1.5) 
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where L is the Lipschitz constant of the original multidimensional function F{y) 
from OLIP ' (|1.2p . 

Thus, one can try to solve the problem fll.ip . fll.2p by using algorithms proposed 
for minimizing Holderian functions fll.Sp . fll.dp in one dimension. To do this in prac¬ 
tice, three main steps should be executed if one wishes to use methods that partition 
the search region and try to obtain and to use a Lipschitz/Holder information (see 
[2HI Eal [37] for a general description of this kind of methods, their convergence 
properties, etc.). First, in order to realize the passage from the multi-dimensional 
problem to the one-dimensional one, computable approximations to the Peano curve 
should be employed in the numerical algorithms. Second, the Holder constant H 
from fll.4l) should be estimated. Finally, a partition strategy of the search region 
should be chosen. 

We have already seen above that in the Lipschitz global optimization there exist 
at least four ways to obtain estimates of the Lipschitz constant. When we move 
to the Holder global optimization the situation is different. In the literature (see 
[m EH EH EH EH ESI), there exist methods that use strategies a), b), and c) 
discussed above. However, inventing for the Holder global optimization a strategy 
similar to the technique d) was an open problem since 1993 when the article [T5] 
showing how to use simultaneously several estimates of L has been published. In 
this paper, we close this gap and propose an algorithm that uses several estimates 
of the Holder constant at each iteration employing space-hlling curves, central point 
based partition strategies, and Holderian minorants. 

The rest of the paper has the following structure. In Section 2, difficulties re¬ 
garding the usage of the strategy d) in the Holderian framework and the proposed 
solution are presented. In Section 3, a new algorithm for solving the problem fll.ll) . 
fll.2p and its convergence properties are described. The new method uses numerical 
approximations to Peano space-filling curves and the scheme of representation of 
intervals with Holderian minorants from Section 2. Section 4 presents results of nu¬ 
merical experiments that compare the new method with its competitors on 800 test 
functions randomly generated by the GKLS-generator from [9]. Finally, Section 5 
contains a brief conclusion. 


2 Lipschitz and Holder minorants in one dimen¬ 
sion 

Let us consider a one-dimensional function f{x) satisfying the Lipschitz condition 
with a constant L over the interval [0,1]. Then it follows from the Lipschitz condition 
that 

f{x) > C^{x) = Ci{x), X e [oi, bi], I <i <k, 

c~{x) = f{mi) - L{nii - x), x e [ai,mi\, 

cfix) = f{mi) - L{x-TUi), xe [mi,bi\, 



( 2 . 1 ) 

( 2 . 2 ) 
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Figure 2: Lipschitz (left) and Holder (right) support functions. 


where C^{x) is (see Fig. [21 left) a piece-wise linear discontinuous minorant (called 
often also support function) for f{x) over each subinterval di = [aj,6j], 1 < f < fc, 
and 

rui = {ai + hi)/2. (2.3) 

The values i?*, 1 < i < A;, called characteristics and being lower bounds for the 
function f{x) over each interval dj, 1 < i < fc, can easily be calculated. In fact, if 
we suppose that an overestimate Li > L of the Lipschitz constant L is given, then 
it follows 

Ri = Ri{Li) = min Ci{x) = /(m^) - 0.5Li{bi - a^). (2.4) 

x£[ai,bi] 

However, in order to solve the multidimensional problem fll.ip . fll.2p by using 
space-hlling curves, instead of working with Lipschitz functions, we should focus 
our attention on the one-dimensional Holderian function f{x) from fll.3p . It follows 
from fll.dp that, for all x^z E [0,1] we have 

f{.x)>f{z)—H\x — z(-^^. (2.5) 

If a point z E [0,1] is hxed, then the function 

Z{x) = f{z) — H\x — z\^^^ 


is a minorant for f{x) over [0,1]. Then, analogously to fl2.ip - fl2.3p . we obtain that 
the function 

Z^{x) = li{x), X E [ai,bi\, I <i <k, (2.6) 

_ j = fi^i) - H{mi-x)^/^, xE[ai,mi], , . 

1 Ifix) = f{mi) - H{x-mi)^l^, xE [mi,bi\, 


is a discontinuous nonlinear minorant for f{x) (see Fig. [21 right). The values Ri, 
1 < i < k, are lower bounds for the function f{x) over each interval di, 1 < i < k, 
and can be calculated as follows if an overestimate Hi > H of the Holder constant 
H is given 


Ri = Ri{Hi) = min li{x) 

xe[ai,bi] 


f{mi) - Hi\{bi - ai)/2|^/^. 


( 2 . 8 ) 
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Figure 3: Representation of intervals in the Euclidean metric in the framework of 
the DIRECT algorithm. 


As it was discussed above, the DIRECT algorithm works simultaneously with 
several estimates of the Lipschitz constant at each iteration. One of the key features 
that allow it to do this is a smart representation of intervals [oj, 6j], 1 < z < /c, in the 
two-dimensional diagram shown in Fig. [3l In this Figure, we have intervals d^, 
dc, do, and dg that are represented by the dots A, B, C, D, and E, respectively. 
The horizontal coordinate of each dot is equal to 0.5(6j — Oj), i.e., half of the length 
of the respective interval [a^, bi], and the vertical coordinate is equal to f{mi) where 
mi is from (1231) (see, e.g., the dot B and its coordinates). Let us consider now 
the intersection of the vertical coordinate axis with the line having the slope Li and 
passing through each dot representing subintervals in the diagram shown in Fig. |3l It 
is possible to see that this intersection gives us exactly the characteristic Ri = Ri{Li) 
from fl2.4p . i.e., the lower bound for /(x) over the corresponding subinterval [a*, 6*] 
if Li > L (note that the points on the vertical axis (dj = 0) do not represent any 
subinterval). 

It can immediately be seen that the diagram allows one to determine very easily 
an interval with the minimal characteristic (in Fig. [3] this interval is represented 
by the dot E, its characteristic is Re{Li)). In the Lipschitz global optimization 
(see, e.g., [ISlIMlEZlEHllin]), the operation of finding an interval with the minimal 
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characteristic (that can be calculated in different ways in different algorithms) is an 
important one. It is executed at each iteration, the found interval is then subdivided 
and f{x) is evaluated at new points belonging to this interval. Moreover, since we 
do not know the exact value of the real Lipschitz constant L, the scheme presented 
in Fig. [3] allows one (see [15]) to take into consideration all possible values of L from 
zero to infinitjli] and to choose a set of promising intervals for partitioning (this issue 
will be discussed in detail later). 

Let us try now to construct a similar diagram in the framework of the Holder 
optimization where the nonlinear support functions li{x) from fl2.7p shown in Fig. |21 
right, are built over each sub interval. In Fig. 01 intervals ds, dc, d^j, and dg are 
again represented by dots A, B, C, D, and E, respectively. If we take an estimate Hi 
of the Holder constant H, then characteristic Rb{Hi) of the interval ds represented 
by the dot B is given by the vertical coordinate of the intersection point of the 
auxiliary curve fl2.7p passed through the point B and the vertical coordinate axis. 

We can see in Fig. 0]that the curves constructed using the estimate Hi and rep¬ 
resenting the nonlinear support function fl2.7p can intersect each other in different 
ways. The procedure of the selection of subintervals for producing new trials {trial 
is an evaluation of f{x) at a point x that is called trial point) is based on estimates 
of the lower bounds of the objective function over current subboxes. Subintervals 
for the further subdivision are chosen taking into account all possible values of the 
Holder constant H. Due to numerous possible intersections of the curves at the rep¬ 
resentation shown in Fig. 01 it becomes unclear how to select, for all possible values 
of id, the set of promising intervals that must be partitioned at each iteration k. 

This difficulty that seemed unsolvable since 1993 did not allow people to con¬ 
struct global optimization algorithms working in the framework of the Holder global 
optimization with all possible Holder constants. This problem is solved in the next 
section that proposes an algorithm that uses, on the one hand, Peano space-hlling 
curves to reduce the multi-dimensional problem fll.ip . fll.2p to one dimension and, 
on the other hand, multiple estimates of the Holder (Lipschitz) constant. The algo¬ 
rithm for solving one-dimensional Holder global optimization problems in the spirit 
of the DIRECT method is a part of it. 


3 The new algorithm 

In order to construct a procedure allowing one to select, at each iteration k of the 
algorithm, a set of intervals to be partitioned with respect to all possible values of the 

^In fact, it is easy to see that if we connect the points A and i? by a line and indicate its slope 
by then for all constants L > the interval dA will have the minimal characteristic and, 

therefore it should be subdivided. Analogously, for all constants L < the interval ds will 

have the minimal characteristic. Then, if we subdivide both intervals, dA and dE, then the interval 
corresponding to the real constant L will be subdivided even though we are not know this value L. 
Thus, the diagram really allows one to consider all possible values of the Lipschitz constant from 
zero to infinity and to choose a small group of intervals for the subsequent subdivision ensuring 
that the interval corresponding to the correct Lipschitz constant belongs to this group. 
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Figure 4: Due to numerous possible intersections of nonlinear Holderian minorants, 
the attempt to use the representation of intervals analogous to that shown in Fig. 0 
does not give the desired effect. 

constant H we proceed as follows. First, we introduce a new graphical representation 
of subintervals di by using instead of Euclidean metric the Holderian one. Namely, we 
propose to represent each interval di = [a,, 6*] G {D’^} by a dot P, with coordinates 
{hi,Fi), where is the current partition of the one-dimensional search interval 
during the fcth iteration and coordinates of the point Pj are calculated as follows 

h, = |(6,-a,)/2|V^, (3.1) 

Fi = f{mi), (3.2) 

where rui is from fl2.3p . i.e., m, is the central point of the interval di = [a*, bi]. 

The introduction of the Holderian metric allows us (see Fig. |5]) to avoid the 
non-linearity and the intersection of minorants giving as a result a diagram similar 
to that from Fig. [31 In Fig. Owe can see the representation of the same intervals 
considered in Fig. O Notice that in Fig. 0 the values in the horizontal axis are 
calculated in the Holderian metric, while the vertical axis values coincide with those 
of Fig. O In this new representation, the intersection of the line with the slope Hi 
passing through any dot representing an interval in the diagram of Fig. [5] and the 
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Figure 5: Representation of intervals with the Holderian metric. 


vertical coordinate axis gives us exactly the characteristic fl2.8p of the corresponding 
interval. 

We can proceed now with the development of the new one-dimensional global 
optimization method following the spirit of the DIRECT algorithm and keeping 
in mind that we deal with the Holderian metric. Each subinterval di of a current 
partition is characterized by a lower bound of the objective function over dj. 
An interval di is selected for a further partitioning if for some value id > 0 of the 
possible Holder constant it has the smallest lower bound for /(x) with respect to 
the other intervals present at this iteration. By changing the value of H from zero 
to inhnity, at each iteration fc, we select a set of intervals that will be partitioned. 

Let us consider how this set of intervals is chosen during each iteration k. The 
following dehnitions state a relation of domination between every two subintervals 
of the current partition of D. 

Definition 3.1 Given an estimate H > 0 of the Holder constant H from ^1-41 ), 
interval di G dominates the interval dj G with respect to H if 


R,{H) < R0). 
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Definition 3.2 An interval dt G {-D^} is said to he nondominated with respect to 
H > 0 if for the chosen value H there is no other interval in which domi¬ 

nates dt- 

In order to be sure to subdivide the nondominated interval corresponding to the 
real constant H, we can select the set of nondominated intervals with respect to all 
possible estimates 0 < H < oo. By using the new graphical representation shown 
in Fig. Oit is easy to determine whether an interval dominates, with respect to an 
estimate of the Holder constant H, some other interval from the partition {D^}. 
For example, in Fig. Owe can see that for the estimate Hi we have 


Rd{Hi) < RciHi) < Rb{Hi), 

so, the interval djD dominates both intervals dc and ds with respect to Hi. Further¬ 
more, Ra{Hi) < Rb{Hi), i.e., the interval dominates the interval d^- Finally, 
the characteristic Re{Hi) is the smallest one, and the interval dg dominates all 
others intervals with respect to Hi, i.e., it is nondominated with respect to Hi (see 

Fig. [5]). 

If a different value H 2 > Hi of the Holder constant is considered (see Fig. [5]), the 
interval do still dominates the interval ds with respect to H 2 , because Rd{H 2 ) < 
Rb{H 2 ), but d^) is dominated by the interval dc, since Rd{H 2 ) > Rc{H 2 ). Moreover, 
we have that Ra{H 2 ) < Re{H 2 ) < Rc{H 2 ), thus, for the chosen estimate H 2 the 
unique nondominated interval is d^. As we can see from this example, some intervals 
(ds, dc, and d^i in Fig. E]) are always dominated by other intervals, independently 
on the chosen estimate of the Holder constant. Other intervals (d^; and d^ in Fig. [5]) 
can be nondominated for one value and dominated for another one. The following 
definition will be very useful hereinafter. 

Definition 3.3 An interval dt G {D^} is called nondominated if there exists an 
estimate 0 < H < 00 of the Holder constant H such that dt is nondominated with 
respect to H. 

In other words, nondominated intervals are intervals over which the objective 
function f{x) has the smallest lower bound from fl2.8p for some particular estimate 
of the Holder constant H. Note that in the two-dimensional diagram {hi, Ff), where 
hi and Ft are from fl3.ip . fl3.2p . the nondominated intervals are located at the bottom 
of each group of points with the same horizontal coordinate. For example in Fig. [6] 
these points are designated as A, B, C, D, E, F, and G. Not all of these intervals 
are nondominated: in fact, in Figure [6] the interval dc is dominated either by the 
interval ds (for example, with respect to Hi > Hbd, where Hbd corresponds to the 
slope of the line passed through the points B and D), or by the interval dci (with 
respect to H 2 < Hbd)- The interval dc is dominated by do and dE, with respect 
to any positive estimate of the constant H. The interval dc is dominated by dF- 
In Figure [HI dots A, B, D, and E represent nondominated intervals. The following 
theorem allow us to easily identify the set of nondominated intervals. 
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Figure 6: The nondominated intervals dA, ds, do, and dE are represented by dots 
A, B, D, and E. 


Theorem 3.1 Let each interval di = [aj,6j] E {D^} he represented by a dot with 
horizontal coordinate hi and vertical coordinate Fi defined in Then, 

intervals that are nondominated in the sense of Definition \3.3\ are located on the 
lower-right convex hull of the set of dots representing the intervals. 

Proof. The proof of the Theorem 13.11 is analogous to the proof of Theorem 2.2 
from |36] . □ 

In practice, nondominated intervals can be found by applying algorithms for 
identifying the convex hull of the dots (see, e.g., the algorithm called Jarvis march, 
or gift wrapping, see [5D]h 

We describe now the partition strategy adopted by the new algorithm for dividing 
subintervals in order to produce new trial points. When, at the generic iteration 
k, we identify the set of nondominated intervals, we proceed with the subdivision 
of each of these intervals only if a signihcant improvement on the function values 
with respect to the current minimal value fminik) is expected. Once an interval dt E 
becomes nondominated, it can be subdivided only if the following condition 
is satished: 

Rt{H) < frmnik) - (3.3) 
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where the lower bound Rt = Rt{H) is from fl2.8p . This condition prevents the 
algorithm from subdividing already well-explored small subintervals. 

Let us suppose now that at the current iteration k of the new algorithm a subin¬ 
terval dt = [oj, 6i], represented in the two-dimensional diagram of Fig. [6] by the dot 
from flS.lH . fl3.2p . has been chosen for partitioning. The subdivision of this 
interval is performed in such a way that three new equal subintervals of the length 
{bt — at)/?) are created, i.e., 


[at,b,] = [ai,pt] U Ipt,qt] U 
Pf ^ + {bt - at)/3, qt^bt- [bt - nf)/3. 


(3.4) 

(3.5) 


The interval dt is removed from the two-dimensional diagram representing the cur¬ 
rent partition of the search interval, and the three newly generated subinter¬ 

vals are introduced both into {D’^} and the diagram. Finally, two new trials /(ci) 
and /(C 2 ) are performed at the points Ci and C 2 of the intervals [at,pt\ and [qt,bt\, 
respectively, i.e.. 


Cl 


at+Pt 

2 


C2 


Qt + bt 
2 


(3.6) 


The central interval [pt, qt] inherits the point = vt+qt objective 

function has been evaluated when the original interval dt = [oi, bt\ has been created. 

Until now we have described the strategies assuming to work with a function 
in one dimension. As was stated above, Strongin has shown that multidimensional 
optimization problems can be solved by using modihed algorithms proposed for 
minimizing functions in one dimension, and therefore in order to solve the global 
optimization problem in N dimensions fll.ip . fll.2p we can use the above developed 
one-dimensional global optimization method together with the space-hlling curves. 
For an effective use of the Peano curve in our algorithm we need computable ap¬ 
proximations of the curve (see [38l 00] for a detailed discussion and a code allowing 
one to implement such approximations). Hereinafter we denote by Pm{x) the ap¬ 
proximation of level M of the Peano curve. In Fig. [T]we can see examples of Peano 
curve approximations of the levels M = 2, 3,4, 5 in dimension N = 2. 

Suppose now that a global optimization method uses an approximation Pm{x) of 
the Peano curve to solve the multidimensional problem and provides a lower bound 
for the corresponding one-dimensional function f{x). Then the value will 
be a lower bound for the function Fiy) in dimension N only along the curve Pm{x). 
The following theorem establishes a lower bound for the function F{y) over the 
entire multidimensional search region [a, b] given the value f/^ . 


Theorem 3.2 Let Ulj he a lower hound along the space-filling curve Pm{x) for a 
multidimensional function F{y), y G [a,b] C R^, satisfying Lipschitz condition with 
constant L, i.e., 

Um<F{pm{x)), xe[0,1]. 

Then the value 

U = Um- 2-(^+^)L\/iV 
is a lower hound for F{y) over the entire region [a, b]. 


12 













Proof. See [23] or the recent monograph [38]. □ 

By using the space-hlling curves we are able to work with a one-dimensional 
function in the interval [0, 1] <Z R^. The level M of the approximation of the Peano 
curve pm{x), is crucial for a good performance of the method. If M is too small, 
the domain in N dimensions may not be well “hlled” and we risk losing the global 
solution. When M grows, the reduced function in one dimension becomes more and 
more oscillating and the number of local minima increases when N increases (see 
[2B] for a detailed discussion on this topic). Then, due to the facts that we are in 
[0,1] and that we use the metric of Holder, it happens that the width of the non- 
dominated interval dt G {D’^} to be partitioned at a generic iteration k can become 
very small. When the dimension N increases, the width of the subintervals can 
reach the computer precision. In order to avoid this situation another condition in 
addition to fl3.3p is required. Namely, when an interval dt = [at, bt] G {D’^} becomes 
nondominated, it can be subdivided only if the following condition is satished: 


bt- at> r], 


(3.7) 


where p is a parameter of the method. 

Now, let us present the new algorithm called MG AS (Multidimensional Global 
optimization Algorithm working with a Set of estimates of the Holder constant). 


Algorithm MGAS 

Step 0. (Initialization). Set the current iteration number k ■= 1. 

Split the initial interval D = [0,1] in three equal parts and set = 1/Q, = 

1/2, = 5/6 and compute the values of the function = f{x^) = F{pm{x^)), 

j = 1, 2, 3, where Pm{x) is the M-approximation of the Peano curve. 

Set the current partition of the search interval = {[0,1/3], [1/3, 2/3], [2/3,1]}. 

Set the current number of intervals J = 3 and the current number of trials 
T = 3. 

Set /min(l) = and Xmm(l) = argmin{f{x^) : j = 1,2,3}. 

After executing k iterations, the iteration k + 1 consists of the following steps. 

Step 1. (Nondominated intervals) Identify both the set S^, C D^, of nondom¬ 
inated intervals, according to Dehnition 13.31 that satisfy conditions fl3.3l) and 
fl3.7p . and the corresponding set of their indices. denotes the partition 
of the search interval D = [0,1] at iteration k. 

Step 2. (Subdivision of nondominated intervals) Set = D^, and perform the 
following Steps 2.1-2.3: 


13 








Step 2.1 (Interval selection). Select a new interval dt = [at,bt] from such 
that 

t = arqmax\bi — aA. 

Step 2.2 (Subdivision and sampling). Subdivide interval dt in three new 
equal subintervals, named dn, dt 2 , dts, of the length {pt — at)/3, following 
(I3.4p . (I3.5p . and produce two new trial points accordingly to (13.6p . 

Eliminate the interval dt from i.e., set \ {dt}, and update 

with the insertion of the three new intervals, i.e., 

U Ki} U K 2 } U Ka}. 

Increase both the current number of intervals J = J + 2, and the current 
number of trials T = T + 2. 

Update the current record fmin and the current record point Xmin if necessary. 

Step 2.3 (Next interval). Eliminate the interval dt from S^, i.e., set 
S'^ = S'^\ {dt} and A = A\ {t}. 

If A 0, then go to Step 2 . 1 . Otherwise go to Step 3. 

Step 3. (End of the current iteration). Increase the iteration counter fc = fc + 1. 
Go to Step 1 and start the next iteration. 

Different stopping criteria can be used in the algorithm introduced above. One 
of them will be introduced in the next section presenting numerical experiments. 

We proceed now to the study of convergence properties of the algorithm. Theo¬ 
rem [3]2] linking the multidimensional global optimization problem fll.ip . fll.2p to the 
one-dimensional problem fll.3p . fll.4p allows us to concentrate our attention on the 
convergence properties of the one-dimensional method. We shall study properties of 
an inhnite sequence {x^^^'^} of trial points generated by the algorithm MGAS when 
we suppose that the number of iteration k goes to inhnity (i.e., in this case the al¬ 
gorithm does not stop). The following theorem establishes the so-called everywhere 
dense convergence of the method, i.e., convergence of the inhnite sequence of trial 
points to any point of the one-dimensional search domain. 

Theorem 3.3 If 7] = 0 m then for any point x E D = [0,1] and any 5 > 0 

there exist an iteration number k{5) > 1 and a point x' G {x^^^^}, k > k{5), such 
that \x — x'\ <5. 

Proof The interval partition scheme fl3.4p . fl3.5p used for each subdivision of intervals 
produces three new subintervals of the length equal to a third of the length of the 
subdivided interval. Since 77 = 0, to prove the Theorem it is sufficient to prove that 
for a hxed value 5 > 0, after a hnite number of iterations k{d), the largest subinterval 
of the current partition {D^^^^} of the domain D will have the length smaller than 5. 
In such a case, in the ^-neighborhood of any point of D there will exist at least one 
trial point generated by the algorithm. □ 
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To see this, let us fix an iteration number k' and consider the group of the 
largest intervals of the partition {D^ } having the horizontal coordinate hmax (in 
the diagram of Fig. [6] this group consists of two points: the dot A and the dot 
above it). As can be seen from the scheme of the algorithm MGAS, for any k' >1 
this group is always taken into account when nondominated intervals are looked 
for. In particular, an interval dt G {D^ } from this group, having the smallest value 
Ft, must be partitioned at each iteration of the algorithm. This happens because 
there always exists a sufficiently large estimate of the Holder constant H for the 
function f{x) such that the interval dt is the nondominated interval with respect to 
Hoo and condition fl3.3p is satisfied for the lower bound Rt{Hao)- 

Three new subintervals having the length equal to a third of the length of dt 
are then inserted into the group with a horizontal coordinate hj < hmax- Since 
each group contains a finite number of intervals, after a sufficiently large number 
of iterations all the intervals of the group hmax will be divided and the group will 
become empty. As a consequence, the group of the largest intervals will now be 
identified by hj, where the difference hmax — hj > Q is finite. The same procedure 
will be repeated with this new group of the largest intervals, and the next new group, 
etc. 

This means that there exists a finite iteration number k{d) such that after per¬ 
forming k{5) iterations of the algorithm MGAS, the length of the largest interval of 
the current partition is smaller than 6 and, therefore, in the ^-neighborhood 

of any point of the search region there will exist at least one trial point generated 
by the algorithm. 

In Fig. [TJ an example of convergence of the sequence of trial points generated 
by the algorithm MGAS in dimension N = 2 using the approximation of the level 
M = 5 to the Peano curve is given. The zone with the high density of the trial 
points corresponds to the global minimizer. 

Fig. |8] shows how this problem was solved in the one-dimensional space. In the 
upper part of Fig. [HI the one-dimensional function corresponding to the curve shown 
in Fig. [Tjand the respective trial points produced by the MGAS at the interval [0,1] 
are presented. The lower part of the Figure shows the dynamics (from bottom to 
top) of 40 iterations executed by the MGAS. It can be seen that each iteration 
contains more than one trial. The piece-wise line connects points with the best 
function value obtained during that iteration. 

In order to conclude this section it should be noticed that Theorem 13.31 estab¬ 
lishes convergence conditions of infinite sequences of trial points generated by the 
algorithm MGAS to any point of the domain [0,1] and therefore to the global min¬ 
imum points X* of the one-dimensional function f{x). The Peano curves used for 
reduction of dimensionality establish a correspondence between subintervals of the 
curve and the A^-dimensional subcubes of the domain [a, b] C . Every point 
on the curve approximates an ^-neighborhood in [a,b], i.e., the points in the N- 
dimensional domain may be approximated differently by the points on the curve in 
dependence on the mutual disposition between the curve and the point in [a, b] to 
be approximated (see [381110]). Here by approximation of a point y G [a, b] we mean 
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Figure 7: Trial points produced by the MG AS and Peano curve approximation of level 
5 while optimizing Function no. 6, class 1, generated by the GKLS-generator; Table 
[3 describes this and other classes of test functions used in the numerical experiments. 

the set of points (called images) on the curve minimizing the Euclidean distance 
from y. It was shown in [381 SO] that the number of the images ranges between 1 
and 2^. These images can be located on the curve very far from each other despite 
their proximity in the dimensional space. Thus, by using the space-hlling Peano 
curve pm{x), the global minimizer y* in the iV-dimensional space can have up to 2^ 
images on the curve, i.e., it is approximated by n, 1 < u < 2^, points y** such that 

y*^=PM{x*^), \\y*^-y*\\<e, I < Z < U, 

where £ > 0 is dehned by the space-hlling curve. Obviously, in the limiting case, 
when M — )■ CX3 and the iteration number /c — )■ cxo, all global minimizers will be found. 
But in practice we work with a hnite M < oo and k < oo, i.e., with a hnite trial 
sequence, then to obtain an £-approximation y*^ of the solution y* it is sufficient to 
hnd only one of the images x*® on the curve. This effect may result in a serious 
acceleration of the search (see [10] for a detailed discussion). 


4 Numerical experiments 

In this section, we present results of numerical experiments performed to compare 
the new algorithm MGAS with the original DIRECT algorithm proposed in [15] and 
its locally biased modihcation LBDirect introduced in [?[ [S|. These methods have 
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Figure 8: The one-dimensional function corresponding to the curve shown in Fig. 
and the respective trial points produced by the MG AS at the interval [0,1]; The lower 
part of the Figure shows the dynamics of fO iterations executed by the MGAS. 

been chosen for comparison because they, just as the MGAS method, do not require 
the knowledge of the objective function gradient and work with several Lipschitz 
constants simultaneously. The Fortran implementation of the two methods described 
in [7] and downloadable from [6] have been used in both methods. 

To execute numerical experiments with the algorithm MGAS, we should dehne 
its parameter ^ from fl3.3p . In DIREGT (see [IS]), where a similar parameter is used, 
the value f is related to the current minimal function value fmin{k) and is hxed as 
follows: 

^ = e|/mm(fc)|, e > 0. (4.1) 

The choice of e between 10“^ and 10“^ has demonstrated good results for DIREGT 
on a set of test functions (see US]). Since the value e = 10 ^ has produced the 
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Figure 9: A function produced by the GKLS-generator shown together with a 
piecewise-linear approximation to Peano curve used for optimization. 

most robust results for DIRECT (see, e.g., EEins]), exactly this value was used in 
fl4.ip for DIRECT in our experiments. The same formula fl4.1l) and the same value 
e = 10“'^ were used in the new algorithm, too. 

The series of experiments involves a total of 800 test functions in the dimensions 
N = 2,3,4,5 generated by the GKLS-generator described in [9] and download¬ 
able from http://wwwinfo.deis.unical.it/~yaro/GKLS.html. More precisely, eight 
classes of 100 functions have been considered. The generator allows one to con¬ 
struct classes of randomly generated multidimensional and multiextremal test func¬ 
tions with known values of local and global minima and their locations. Each test 
class contains 100 functions and only the following Eve parameters should be defined 
by the user: 

N - problem dimension; 
m - number of local minima; 
f* - value of the global minima; 

r* - radius of the attraction region of the global minimizer; 
d - distance from the global minimizer to the vertex of the paraboloid. 

The generator works by constructing in a convex quadratic function, i.e., a 
paraboloid, systematically distorted by polynomials. In our numerical experiments 
we have considered classes of continuously differentiable test functions with m = 10 
local minima. The global minimum value f* has been hxed equal to —1.0 for all 
classes. An example of a function generated by the GKLS can be seen in Fig. O 
By changing the user-dehned parameters, classes with different properties can be 
created. For example, a more difficult test class can be obtained either by decreasing 
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Class 

Difficulty 

N 

m 

r 

d 

r* 

1 

Simple 

2 

10 

-1.0 

0.90 

0.20 

2 

Hard 

2 

10 

-1.0 

0.90 

0.10 

3 

Simple 

3 

10 

-1.0 

0.66 

0.20 

4 

Hard 

3 

10 

-1.0 

0.90 

0.20 

5 

Simple 

4 

10 

-1.0 

0.66 

0.20 

6 

Hard 

4 

10 

-1.0 

0.90 

0.20 

7 

Simple 

5 

10 

-1.0 

0.90 

0.40 

8 

Hard 

5 

10 

-1.0 

0.90 

0.30 


Table 1: Description of 8 classes of test functions used in experiments 


Class 1 

Class 5 

r 

Average 

Maximal 

n 

Average 

Maximal 

10-4 

174.24 

565 

10-® 

12174.20 

171561 

10-® 

227.60 

889 

10-10 

10674.30 

95467 

10“® 

268.98 

1279 

10-12 

15145.12 

143075 


Table 2: Sensitivity analysis for the parameter rj from \3. 7p 

the radius r* of the attraction region of the global minimizer or by increasing the 
distance, d, from the global minimizer to the paraboloid vertex. In this paper, for 
each dimension N = 2,3,4, 5, two test classes where considered: a simple one and 
a difficult one, see Table [T] that describes the classes used in the experiments. Since 
the GKLS-generator provides functions with known locations of global minima, the 
experiments have been carried out by using the following stopping criteria. 

Stopping criteria. If y* denotes the global minimizer of the i-th function of 
the test class, 1 < i < 100, then the search terminates either when the maximal 
number of trials T^ax, equal to 1000000, was reached or when a trial point falls in a 
ball Bi having a radius p and the center at the global minimizer of the i-th function 
of the class, i.e., 

B, = {yeR^ : \\y - ylW < p}, 1 < z < 100. (4.2) 

All the methods under comparison can execute p{k) > 1 trials in the course of 
each k-th iteration, therefore, when condition fl4.2p is satished at an iteration k* 
the number of trials executed to solve the problem is calculated as YX=iP{^)- The 
radius p from fl4.2p in the stopping rule was hxed equal to 0.01\/]V for classes 1-6 
and 0.02\/]V for classes 7 and 8. 

In order to show the influence of the parameter p introduced in (13.7p on the 
search, a sensitivity analysis has been performed. Two different classes of test 
functions have been considered: class 1 in iV = 2 and class 5 in = 4 (see Tabled]). 
Three different values of the parameter p were used for each class. In Table jj] 
the average and the maximal number of function evaluations calculated in order to 
satisfy the stopping rule for all 100 functions of each class are reported. Notice that 
the best results (shown in bold) were obtained for p = 10“^ for the class 1, and for 
p = 10“^° for the class 5. In the case of dimension N = 2, values of p smaller than 
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Class 

Average number of trials 

Maximal number of trials 


DIRECT 

LBDirect 

MGAS 

DIRECT 

LBDirect 

MGAS 

1 

208.54 

304.28 

174.24 

1159 

2665 

565 

2 

1081.42 

1291.70 

622.60 

3201 

4245 

1749 

3 

1140.68 

1893.02 

1153.64 

13369 

20779 

5267 

4 

>42334.36 

5245.72 

2077.60 

1000000(4) 

32603 

9809 

5 

>47768.28 

21932.94 

9961.70 

1000000(4) 

179383 

95467 

6 

>95908.99 

74193.53 

21687.76 

1000000(7) 

372633 

319493 

7 

>33878.09 

31955.06 

7306.04 

1000000(3) 

146623 

36819 

8 

>149578.61 

>93876.77 

23460.00 

1000000(13) 

1000000(1) 

96287 


Table 3: Results of experiments 


10“^ produce an intensification of the search in subintervals already well-explored. 
Conversely, when the dimension N increases the reduced function in one dimension 
becomes more oscillating and it is necessary to reduce the value of the parameter r]. 
In general, if a too small value of the parameter is applied, the algorithm continues 
the search in parts of the domain that were already well-explored during the previous 
iterations. Obviously, if a too large value of p is used it happens that from a certain 
iteration onward, no interval is selected for the subdivision and the global solution 
can be lost. 

Taking into account results of the sensitivity analysis, the following values have 
been chosen: p = 10“^ for classes 1 and 2, tj = 10“^ for the class 3, p = 10“^ for the 
class 4 and rj = 10“^° for the classes 5-8. 

In the algorithm MGAS, an M-approximation of the Peano curve has been con¬ 
sidered. In particular the level M of the curve must be chosen taking in mind the 
constraint NM < G, where N is the dimension of the problem and G is the number 
of digits in the mantissa depending on the computer that is used for the implemen¬ 
tation (see [34] for more details). In our experiments we had G = 52, thus the value 
M = 10 has been used for all the classes of test functions. 

Results of numerical experiments with the eight GKLS test classes from Table [T] 
are shown in Table El The columns “Average number of trials” in Table E] report the 
average number of trials performed during minimization of the 100 functions from 
each GKLS class. The simbol “>” reflects the situations when not all functions of 
a class were successfully minimized by the method under consideration: that is the 
method stopped when 1000000 trials had been executed during minimizations of 
several functions of this particular test class. In these cases, the value 1000000 was 
used in calculations of the average value, providing in such a way a lower estimate of 
the average. The columns “Maximal number of trials” report the maximal number of 
trials required for satisfying the stopping rule for all 100 functions of the class. The 
notation “1000000 (j)” means that after 1000000 function evaluations the method 
under consideration was not able to solve j problems. 

Table E] reports the ratio between the maximal (and average) number of trials 
performed by DIREGT and LBDirect with respect to the corresponding number of 
trials performed by the new algorithm MGAS. It can be seen from Table E] that 
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the algorithm MGAS outperforms both competitors signihcantly on the given test 
classes. 


Class 

Average number of trials 

Maximal number of trials 


DIRECT/MG AS 

LBDirect/MGAS 

DIRECT/MG AS 

LBDirect/MGAS 

1 

1.20 

1.75 

2.05 

4.72 

2 

1.74 

2.07 

1.83 

2.43 

3 

0.99 

1.64 

2.54 

3.95 

4 

>20.38 

2.52 

>101.95 

3.32 

5 

>4.80 

2.20 

>10.47 

1.88 

6 

>4.42 

3.42 

>3.13 

1.17 

7 

>4.64 

4.37 

>27.16 

6.82 

8 

>6.38 

>4.00 

>10.39 

>10.39 


Table 4: Speed up obtained by MGAS with respect to its competitors 


Figure Ho] shows a comparison of the three methods using the so called operating 
characteristics introduced in 1978 in [H] (see, e.g., |10] for their English-language 
description). These characteristics show very well the performance of algorithms 
under the comparison for each class of test functions. On the horizontal axis we have 
the number of function evaluations and the vertical coordinate of each curve shows 
how many problems have been solved by one or another method after executing 
the number of function evaluations corresponding to the horizontal coordinate. For 
instance, the hrst graph in the right-hand column {N = 2, class 2) shows that 
after 1000 function evaluations the LBDirect has found the global solution at 33 
problems, DIRECT at 47 problems, and the MGAS at 84 problems. Thus, the 
behavior of an algorithm is better if its characteristic is higher than characteristics 
of its competitors. In Figure [TOl the left-hand column of characteristics, the behavior 
of algorithms MGAS, DIRECT, and LBDirect on the classes 1, 3, 5, and 7 is shown. 
The right-hand column presents the situation when the more difficult classes 2, 4, 
6, and 8 have been used. 

5 A brief conclusion 

The global optimization problem of a multi-dimensional, non-differentiable, and 
multiextremal function has been considered in this paper. It was supposed that the 
objective function can be given as a ‘black-box’ and the only available information 
is that it satisfies the Lipschitz condition with an unknown Lipschitz constant over 
the search region being a hyperinterval in R^. 

A new deterministic global optimization algorithm called MGAS has been pro¬ 
posed. It uses the following two ideas: the MGAS applies numerical approximations 
to space-filling curves to reduce the original Lipschitz multi-dimensional problem to 
a univariate one satisfying the Holder condition; the MGAS at each iteration uses a 
new geometric technique working with a number of possible Holder constants chosen 
from a set of values varying from zero to infinity evolving so ideas of the popular 
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Figure 10: Operating characteristics for the methods MGAS, DIRECT, and LBDi- 
rect for the eight classes from Table\^ The left-hand column presents results for the 
simple classes 1,3,5,7, from top to bottom. The right-hand column shows results for 
the difficult classes 2,4,6,8. 22 














































DIRECT method to the held of Holder global optimization. Convergence conditions 
of the MGAS have been established. Nnmerical experiments carried out on 800 of 
test functions generated randomly have been executed. 

It can be seen from the numerical experiments that the new algorithm shows 
quite a promising performance in comparison with its competitors. Moreover, the 
advantage of the new technique becomes more pronounced for harder problems. 
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